Efficient query processing in managed runtimes
Abstract
This thesis presents strategies to improve query evaluation performance over huge volumes of relational-like data stored in the memory space of managed applications. Storing and processing application data in the memory space of managed applications is motivated by the convergence of two recent trends in data management. First, dropping DRAM prices have led to memory capacities that allow the entire working set of an application to fit into main memory and to the emergence of in-memory database systems (IMDBs). Second, language-integrated query transparently integrates query processing syntax into programming languages and therefore allows complex queries to be composed in the application. IMDBs typically serve as data stores to applications written in an object-oriented language running on a managed runtime. In this thesis, we propose a deeper integration of the two by storing all application data in the memory space of the application and using language-integrated query, combined with query compilation techniques, to provide fast query processing. As a starting point, we look into storing data as runtime-managed objects in collection types provided by the programming language. Queries are formulated using language-integrated query and dynamically compiled to specialized functions that produce the query result more efficiently by leveraging query compilation techniques similar to those used in modern database systems. We show that the generated query functions significantly improve query processing performance compared to the default execution model for language-integrated query. However, we also identify additional inefficiencies that can only be addressed by processing queries using low-level techniques, which cannot be applied to runtime-managed objects. To address this, we introduce a staging phase in the generated code that makes query-relevant managed data accessible to low-level query code.
Our experiments in .NET show an improvement in query evaluation performance of up to an order of magnitude over the default language-integrated query implementation. Motivated by additional inefficiencies caused by automatic garbage collection, we introduce a new collection type, the black-box collection. Black-box collections integrate the in-memory storage layer of a relational database system to store data and hide the internal storage layout from the application by employing existing object-relational mapping techniques (hence the name black-box). Our experiments show that black-box collections provide better query performance than runtime-managed collections by allowing the generated query code to directly access the underlying relational in-memory data store using low-level techniques. Black-box collections also outperform
Similar resources
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mi...
Declarative Query Processing in Imperative Managed Runtimes
The falling price of main memory has led to the development and growth of in-memory databases. At the same time, new advances in memory technology, like persistent memory, make it possible to have a truly universal storage model, accessed directly through the programming language in the context of a fully managed runtime. This environment is further enhanced by language-integrated query, which ...
LazyTainter: Memory-Efficient Taint Tracking in Managed Runtimes
Zheng Wei, Master of Science, Graduate Department of Computer Science, University of Toronto, 2014. The leakage of private information is of great concern on mobile devices since they contain a great deal of sensitive information. This has spurred interest in the use of taint tracking systems to track and monitor the flow of private i...
EEQR: An Energy Efficient Query-Based Routing Protocol for Wireless Sensor Networks
Routing in Wireless Sensor Networks (WSNs) is a very challenging task due to the large number of nodes, their mobility, and the lack of proper infrastructure. Since the sensors are battery-powered devices, energy efficiency is considered one of the main factors in designing routing protocols for WSNs. Most energy-aware routing protocols are mere energy savers that attempt to decrease the energy...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword search is known as a user-friendly alternative to structured query languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...